Near-Optimal Reinforcement Learning in Dynamic Treatment Regimes
A dynamic treatment regime (DTR) consists of a sequence of decision rules, one per stage of intervention, that dictate how to assign treatment to a patient based on the evolving history of treatments and covariates. Such regimes are particularly effective for managing chronic disorders and are arguably a key step toward more personalized decision-making. In this paper, we investigate the online reinforcement learning (RL) problem of selecting optimal DTRs when observational data are available. We develop the first adaptive algorithm that achieves near-optimal regret for DTRs in online settings, without any access to historical data. We further derive informative bounds on the system dynamics of the underlying DTR from confounded observational data. Finally, we combine these results into a novel RL algorithm that efficiently learns the optimal DTR while leveraging abundant, yet imperfect, confounded observations.
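The staged structure described in the abstract is easy to sketch. Below is a minimal, hypothetical illustration (not the paper's algorithm) of rolling out a DTR: one decision rule per stage, each mapping the accumulated history of covariates and treatments to the next treatment. The names `run_dtr`, `treat_if_positive`, and `lowers_by_treatment` are illustrative, not from the paper.

```python
def run_dtr(rules, transition, x0, horizon):
    """Roll out a dynamic treatment regime: at stage t, rules[t] maps the
    full history of covariates and treatments observed so far to a treatment."""
    history = [("covariate", x0)]
    treatments = []
    for t in range(horizon):
        a = rules[t](history)          # decision rule for stage t
        treatments.append(a)
        history.append(("treatment", a))
        history.append(("covariate", transition(history)))  # next covariate
    return treatments

# Illustrative rule and dynamics: treat while the latest covariate is
# positive; each unit of treatment lowers the covariate by one.
def treat_if_positive(history):
    return 1 if history[-1][1] > 0 else 0

def lowers_by_treatment(history):
    x0 = history[0][1]
    return x0 - sum(v for kind, v in history if kind == "treatment")
```

For example, `run_dtr([treat_if_positive] * 3, lowers_by_treatment, 2, 3)` returns `[1, 1, 0]`: under these toy dynamics the patient is treated at the first two stages and not at the third.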
Near-Optimal Reinforcement Learning with Self-Play
This paper considers the problem of designing optimal algorithms for reinforcement learning in two-player zero-sum games. We focus on self-play algorithms, which learn the optimal policy by playing against themselves without any direct supervision. In a tabular episodic Markov game with $S$ states, $A$ max-player actions, and $B$ min-player actions, the best existing algorithm for finding an approximate Nash equilibrium requires $\tilde{O}(S^2AB)$ steps of game playing, highlighting only the dependency on $(S, A, B)$. In contrast, the best existing lower bound scales as $\Omega(S(A+B))$, leaving a significant gap from the upper bound. This paper closes this gap for the first time: we propose an optimistic variant of the Nash Q-learning algorithm with sample complexity $\tilde{O}(SAB)$, and a new Nash V-learning algorithm with sample complexity $\tilde{O}(S(A+B))$. The latter result matches the information-theoretic lower bound in all problem-dependent parameters except for a polynomial factor of the length of each episode. In addition, we present a computational hardness result for learning the best response against a fixed opponent in Markov games---a learning objective different from finding the Nash equilibrium.
- North America > United States > Oregon > Benton County > Corvallis (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
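To illustrate the self-play idea in the abstract above (learning by playing against oneself, with no direct supervision), here is a minimal classical sketch: fictitious play on the matching-pennies matrix game. In zero-sum games, its empirical action frequencies converge to a Nash equilibrium, here $(1/2, 1/2)$ for both players. This is a toy illustration, not the paper's Nash Q-learning or Nash V-learning algorithms.

```python
# Matching pennies: payoffs for the max (row) player; the min (column)
# player receives the negation. The unique Nash equilibrium is the
# uniform mixture (1/2, 1/2) for both players.
A = [[1.0, -1.0], [-1.0, 1.0]]

def fictitious_play(rounds=5000):
    """Each player best-responds to the opponent's empirical action
    frequencies; no supervision beyond the game matrix is used."""
    row_counts = [1.0, 1.0]  # smoothed counts of past row actions
    col_counts = [1.0, 1.0]  # smoothed counts of past column actions
    for _ in range(rounds):
        row_freq = [c / sum(row_counts) for c in row_counts]
        col_freq = [c / sum(col_counts) for c in col_counts]
        # Row player maximizes expected payoff against col_freq.
        row_vals = [sum(A[i][j] * col_freq[j] for j in range(2)) for i in range(2)]
        i = max(range(2), key=lambda k: row_vals[k])
        # Column player minimizes expected payoff against row_freq.
        col_vals = [sum(row_freq[i2] * A[i2][j] for i2 in range(2)) for j in range(2)]
        j = min(range(2), key=lambda k: col_vals[k])
        row_counts[i] += 1.0
        col_counts[j] += 1.0
    return ([c / sum(row_counts) for c in row_counts],
            [c / sum(col_counts) for c in col_counts])
```

After a few thousand rounds, both players' empirical frequencies are close to the equilibrium mixture, even though every individual round is a pure best response.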
Near-optimal Reinforcement Learning in Factored MDPs
Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer $\Omega(\sqrt{SAT})$ regret on some MDP, where $T$ is the elapsed time and $S$ and $A$ are the cardinalities of the state and action spaces. This implies $T = \Omega(SA)$ time to guarantee a near-optimal policy. In many settings of practical interest, due to the curse of dimensionality, $S$ and $A$ can be so enormous that this learning time is unacceptable. We establish that, if the system is known to be a \emph{factored} MDP, it is possible to achieve regret that scales polynomially in the number of \emph{parameters} encoding the factored MDP, which may be exponentially smaller than $S$ or $A$. We provide two algorithms that satisfy near-optimal regret bounds in this context: posterior sampling reinforcement learning (PSRL) and an upper confidence bound algorithm (UCRL-Factored).
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)
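The posterior-sampling idea behind PSRL is easiest to see in its bandit special case, Thompson sampling: sample a model from the posterior, act greedily with respect to the sample, then update the posterior with the outcome. The sketch below shows that special case with Beta posteriors over Bernoulli arms; it illustrates the principle only, not the paper's PSRL for factored MDPs.

```python
import random

def thompson_sampling(true_means, rounds=3000, seed=0):
    """Posterior sampling in the bandit special case of PSRL: sample a
    model (here, one mean per arm) from the posterior, act greedily with
    respect to the sample, then update the posterior with the reward."""
    rng = random.Random(seed)
    n = len(true_means)
    alpha = [1.0] * n  # Beta(1, 1) prior per Bernoulli arm
    beta = [1.0] * n
    pulls = [0] * n
    for _ in range(rounds):
        sampled = [rng.betavariate(alpha[a], beta[a]) for a in range(n)]
        a = max(range(n), key=lambda k: sampled[k])  # greedy in the sampled model
        reward = 1 if rng.random() < true_means[a] else 0
        alpha[a] += reward
        beta[a] += 1 - reward
        pulls[a] += 1
    return pulls
```

With arms of mean 0.2 and 0.8, the pulls concentrate on the better arm. PSRL applies the same recipe with a posterior over entire MDPs and a planning step in place of the greedy arm choice; the factored structure studied in the paper lets that posterior be parameterized far more compactly than by $S$ and $A$ alone.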
Reviews: Near-Optimal Reinforcement Learning in Dynamic Treatment Regimes
Update after rebuttal: Due to the author comments and, in particular, discussions with the other reviewers, I have updated my score from a 4 to a weak accept (6). For the future draft, aside from the revisions and clarifications the authors have promised in the rebuttal, I recommend the following slight modifications to improve the manuscript: The motivation in the introduction would be strengthened by drawing clearer connections to the real world. The authors should consider picking a specific real-world example and illustrating the method through it (even if it is not possible to provide simulation results on such an example). In line with this, the authors should be careful in their discussion of safe RL. Such methods typically use constraints to ensure safety, but the authors do not appear to explicitly use or discuss such methods here.
Reviews: Near-Optimal Reinforcement Learning in Dynamic Treatment Regimes
In this paper, the authors provide a method for incorporating observational data (possibly subject to unobserved confounding) to improve the performance of policy learning in online settings (the crucial results are Theorems 5, 7, and 8). After a period of discussion, the reviewers came to a consensus that this paper merits publication at NeurIPS and will contribute to the RL literature by giving a principled method of incorporating observational data, even if confounded.
Review for NeurIPS paper: Near-Optimal Reinforcement Learning with Self-Play
Additional Feedback: *) Is there a reason to include Algorithm 1? Algorithm 2 seems to give improved performance relative to it; if so, why present both algorithms rather than just Algorithm 2? *) Although Equation 9 can be thought of as a set of $n \cdot m$ linear constraints, why is the optimization problem always feasible? Although the authors devote half a page to explaining this procedure, I feel it is not well explained: most of the discussion is not devoted to the policy certification procedure itself. Why, for a fixed $\mu$, is the best response not Markovian?
Review for NeurIPS paper: Near-Optimal Reinforcement Learning with Self-Play
After reading the reviews and authors' responses, it seems the only main concern raised is the lack of experiments. My opinion is that while experiments would be nice to have, the lack of experiments is not a significant concern if the theoretical results are strong enough. In my own assessment of the paper, I find the theoretical results to be indeed quite a strong contribution to the field (they provide the first algorithm to match the PAC lower bound, for a problem which has quite a few previous works). The reviewers seem to agree with this point in their reviews. I, therefore, recommend that the paper be accepted.